Introduction

The data-set I will be working with for this project is the white wine quality data-set provided by Udacity. The following tables show the names of the thirteen variables in the white wine data-set (I will add one variable), the structure of the variable fields, and a quantile summary of each variable field. As I explore this data I will be focusing on one major question: what makes a quality bottle of wine?

The following code was used to create a Rating variable. I did this for better grouping of data and easier viewing. Anything of quality 3, 4, or 5 is assigned ‘Bad’. 6 is Average. 7 is Good. 8 is Great. 9 is Excellent.

# Add an ordered factor named "rating" that maps each quality score to a text label.
wineQuality$rating <- ifelse(wineQuality$quality <= 5, 'Bad', ifelse(
  wineQuality$quality < 7, 'Average', ifelse(
    wineQuality$quality < 8, 'Good', ifelse(
      wineQuality$quality < 9, 'Great', 'Excellent'))))
wineQuality$rating <- ordered(wineQuality$rating,
                       levels = c('Bad', 'Average', 'Good', 'Great', 
                                  'Excellent'))
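
As a hedged alternative to the nested ifelse() calls, the same mapping could be sketched with base R's cut(). The `quality` vector below is a hypothetical stand-in for wineQuality$quality (one wine per score), not the real data.

```r
# Hypothetical stand-in for wineQuality$quality: one wine at each score from 3 to 9.
quality <- 3:9

# cut() uses right-closed intervals by default:
# (-Inf,5], (5,6], (6,7], (7,8], (8,Inf]
rating <- cut(quality,
              breaks = c(-Inf, 5, 6, 7, 8, Inf),
              labels = c('Bad', 'Average', 'Good', 'Great', 'Excellent'),
              ordered_result = TRUE)

# Show each score next to its label.
data.frame(quality, rating)
```

Keeping the break points in one vector makes the boundaries easier to audit than four nested conditions, though either approach yields an ordered factor with the same five levels.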



The variables within our list and their names.

##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"              "rating"



The structure of the data frame. We have 4,898 observations of 14 variables, or columns, with each variable's type listed as well.

## 'data.frame':    4898 obs. of  14 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
##  $ volatile.acidity    : num  0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
##  $ citric.acid         : num  0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
##  $ residual.sugar      : num  20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
##  $ chlorides           : num  0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
##  $ free.sulfur.dioxide : num  45 14 30 47 47 30 30 45 14 28 ...
##  $ total.sulfur.dioxide: num  170 132 97 186 186 97 136 170 132 129 ...
##  $ density             : num  1.001 0.994 0.995 0.996 0.996 ...
##  $ pH                  : num  3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
##  $ sulphates           : num  0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
##  $ alcohol             : num  8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
##  $ quality             : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ rating              : Ord.factor w/ 5 levels "Bad"<"Average"<..: 2 2 2 2 2 2 2 2 2 2 ...



Below I have a summary of each variable shown as quantiles. This data can give us some clues as to what is happening within our data. For example, under residual.sugar and free.sulfur.dioxide, I have some potential outliers. This can be seen by comparing the min and max values with the mean. Since these outliers have not pulled the mean too far away from the median, I can assume that any outliers I have do not misrepresent the data. As this data-set is tidy, I fortunately have no NA’s.

##        X        fixed.acidity    volatile.acidity  citric.acid    
##  Min.   :   1   Min.   : 3.800   Min.   :0.0800   Min.   :0.0000  
##  1st Qu.:1225   1st Qu.: 6.300   1st Qu.:0.2100   1st Qu.:0.2700  
##  Median :2450   Median : 6.800   Median :0.2600   Median :0.3200  
##  Mean   :2450   Mean   : 6.855   Mean   :0.2782   Mean   :0.3342  
##  3rd Qu.:3674   3rd Qu.: 7.300   3rd Qu.:0.3200   3rd Qu.:0.3900  
##  Max.   :4898   Max.   :14.200   Max.   :1.1000   Max.   :1.6600  
##  residual.sugar     chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   : 0.600   Min.   :0.00900   Min.   :  2.00      Min.   :  9.0       
##  1st Qu.: 1.700   1st Qu.:0.03600   1st Qu.: 23.00      1st Qu.:108.0       
##  Median : 5.200   Median :0.04300   Median : 34.00      Median :134.0       
##  Mean   : 6.391   Mean   :0.04577   Mean   : 35.31      Mean   :138.4       
##  3rd Qu.: 9.900   3rd Qu.:0.05000   3rd Qu.: 46.00      3rd Qu.:167.0       
##  Max.   :65.800   Max.   :0.34600   Max.   :289.00      Max.   :440.0       
##     density             pH          sulphates         alcohol     
##  Min.   :0.9871   Min.   :2.720   Min.   :0.2200   Min.   : 8.00  
##  1st Qu.:0.9917   1st Qu.:3.090   1st Qu.:0.4100   1st Qu.: 9.50  
##  Median :0.9937   Median :3.180   Median :0.4700   Median :10.40  
##  Mean   :0.9940   Mean   :3.188   Mean   :0.4898   Mean   :10.51  
##  3rd Qu.:0.9961   3rd Qu.:3.280   3rd Qu.:0.5500   3rd Qu.:11.40  
##  Max.   :1.0390   Max.   :3.820   Max.   :1.0800   Max.   :14.20  
##     quality            rating    
##  Min.   :3.000   Bad      :1640  
##  1st Qu.:5.000   Average  :2198  
##  Median :6.000   Good     : 880  
##  Mean   :5.878   Great    : 175  
##  3rd Qu.:6.000   Excellent:   5  
##  Max.   :9.000
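
The outlier check above compares the min and max with the mean. A different, commonly used rule of thumb (not the method used in this report) flags values above Q3 + 1.5 * IQR. A minimal sketch, using only the quartiles and maxima printed in the summary:

```r
# Quartiles and maxima copied from the summary output above.
q1        <- c(residual.sugar = 1.7,  free.sulfur.dioxide = 23)
q3        <- c(residual.sugar = 9.9,  free.sulfur.dioxide = 46)
max_value <- c(residual.sugar = 65.8, free.sulfur.dioxide = 289)

# Tukey's rule of thumb: anything above Q3 + 1.5 * IQR is a potential outlier.
upper_fence <- q3 + 1.5 * (q3 - q1)

data.frame(q1, q3, upper_fence, max_value,
           max_above_fence = max_value > upper_fence)
```

Under this rule the fences are 22.2 for residual.sugar and 80.5 for free.sulfur.dioxide, so both maxima sit well beyond them, which agrees with the reading that these two variables contain potential outliers.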



The table below shows the wine count by rating. The most common rating is 6(Average), but there are fewer wines above 6(Average) than below it.
The table also shows that I only have 180 wines that would be considered Great or Excellent. This may help isolate what makes a quality glass of wine moving forward.

## 
##       Bad   Average      Good     Great Excellent 
##      1640      2198       880       175         5
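
The figures quoted for this table can be checked with a little arithmetic. This sketch simply re-enters the counts shown above; the variable names are my own.

```r
# Counts copied from the rating table above.
rating_counts <- c(Bad = 1640, Average = 2198, Good = 880,
                   Great = 175, Excellent = 5)

total         <- sum(rating_counts)                                   # 4898
above_average <- sum(rating_counts[c('Good', 'Great', 'Excellent')])  # 1060
top_two       <- sum(rating_counts[c('Great', 'Excellent')])          # 180

# Share of each rating as a percentage of all wines.
round(100 * rating_counts / total, 1)
```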



Univariate Plots Section

Exploring the quality by number of samples.



Exploring the alcohol content by number of samples.



Exploring free sulfur dioxide by number of samples.



Exploring total sulfur dioxide by number of samples.



This table lists the five white wines rated 9(Excellent). It is interesting that only five wines earned such recognition out of almost 5000.

X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol quality rating
775 775 9.1 0.27 0.45 10.6 0.035 28 124 0.99700 3.20 0.46 10.4 9 Excellent
821 821 6.6 0.36 0.29 1.6 0.021 24 85 0.98965 3.41 0.61 12.4 9 Excellent
828 828 7.4 0.24 0.36 2.0 0.031 27 139 0.99055 3.28 0.48 12.5 9 Excellent
877 877 6.9 0.36 0.34 4.2 0.018 57 119 0.98980 3.28 0.36 12.7 9 Excellent
1606 1606 7.1 0.26 0.49 2.2 0.032 31 113 0.99030 3.37 0.42 12.9 9 Excellent
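
A table like this one could be produced by filtering on quality. The sketch below assumes a small stand-in data frame named `wines`, built from the first three rows of the str() output and the five rows above; the real code would filter wineQuality instead.

```r
# Stand-in data: three 6-rated wines from the str() output plus the five 9-rated wines.
wines <- data.frame(
  X       = c(1, 2, 3, 775, 821, 828, 877, 1606),
  alcohol = c(8.8, 9.5, 10.1, 10.4, 12.4, 12.5, 12.7, 12.9),
  quality = c(6, 6, 6, 9, 9, 9, 9, 9)
)

# Keep only the wines with the top quality score.
excellent <- subset(wines, quality == 9)
excellent
```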



Exploring pH, I found that as quality rises, the mean and median of pH also rise, while the range between min and max generally decreases.

## wineQuality$rating: Bad
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.79    3.08    3.16    3.17    3.24    3.79 
## ------------------------------------------------------------ 
## wineQuality$rating: Average
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.720   3.080   3.180   3.189   3.280   3.810 
## ------------------------------------------------------------ 
## wineQuality$rating: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.840   3.100   3.200   3.214   3.320   3.820 
## ------------------------------------------------------------ 
## wineQuality$rating: Great
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.940   3.120   3.230   3.219   3.330   3.590 
## ------------------------------------------------------------ 
## wineQuality$rating: Excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.200   3.280   3.280   3.308   3.370   3.410
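
The layout of the output above is consistent with base R's by() applied to summary(), although the exact call is not shown in this report. A minimal sketch on a stand-in data frame (`wines`, my own name), using the pH values of the first three wines in the str() output and of the five Excellent wines:

```r
# Stand-in data: pH for three Average wines and the five Excellent wines.
wines <- data.frame(
  pH     = c(3.00, 3.30, 3.26, 3.20, 3.41, 3.28, 3.28, 3.37),
  rating = ordered(c(rep('Average', 3), rep('Excellent', 5)),
                   levels = c('Average', 'Excellent'))
)

# One six-number summary per rating, printed in the same style as above.
by(wines$pH, wines$rating, summary)

# Distance between min and max for each rating.
ph_range <- tapply(wines$pH, wines$rating, function(x) diff(range(x)))
ph_range
```

For the Excellent group this reproduces the mean of 3.308 and a min-to-max range of 0.21 shown in the table.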



This table of free.sulfur.dioxide shows some interesting numbers worthy of further investigation. The minimum free sulfur dioxide for an Excellent glass of white wine is notably higher than for every other rating, and the maximum is significantly lower. This tight value range may hold the best clues, thus far, as to what makes an excellent glass of white wine.

## wineQuality$rating: Bad
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   20.00   34.00   35.34   49.00  289.00 
## ------------------------------------------------------------ 
## wineQuality$rating: Average
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   24.00   34.00   35.65   46.00  112.00 
## ------------------------------------------------------------ 
## wineQuality$rating: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   25.00   33.00   34.13   41.00  108.00 
## ------------------------------------------------------------ 
## wineQuality$rating: Great
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   28.00   35.00   36.72   44.50  105.00 
## ------------------------------------------------------------ 
## wineQuality$rating: Excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    24.0    27.0    28.0    33.4    31.0    57.0



This table of total sulfur dioxide as it relates to quality is interesting. Excellent glasses of white wine have a much higher minimum than the other ratings and a smaller maximum as well. This range seems very specific to higher quality white wine.

## wineQuality$rating: Bad
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     9.0   117.0   149.0   148.6   182.0   440.0 
## ------------------------------------------------------------ 
## wineQuality$rating: Average
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    18.0   107.2   132.0   137.0   164.0   294.0 
## ------------------------------------------------------------ 
## wineQuality$rating: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    34.0   101.0   122.0   125.1   144.2   229.0 
## ------------------------------------------------------------ 
## wineQuality$rating: Great
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    59.0   102.5   122.0   126.2   150.0   212.5 
## ------------------------------------------------------------ 
## wineQuality$rating: Excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      85     113     119     116     124     139
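
To put numbers on the narrowing described for free and total sulfur dioxide, the min-to-max range per rating can be computed directly from the two tables above. This is only arithmetic on the printed values, and the Excellent row rests on just five wines.

```r
ratings <- c('Bad', 'Average', 'Good', 'Great', 'Excellent')

# Min and max per rating, copied from the two sulfur dioxide tables above.
free_min  <- c(2, 3, 5, 6, 24)
free_max  <- c(289, 112, 108, 105, 57)
total_min <- c(9, 18, 34, 59, 85)
total_max <- c(440, 294, 229, 212.5, 139)

ranges <- data.frame(rating      = ratings,
                     free_range  = free_max - free_min,
                     total_range = total_max - total_min)
ranges
```

Both ranges shrink at every step from Bad to Excellent: 287 down to 33 for free sulfur dioxide and 431 down to 54 for total sulfur dioxide.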



I can see the same trend in this table of alcohol content: as quality rises, so does alcohol content. This is interesting because alcohol content is determined by other variables which are present, such as sugars, and I didn’t think one was supposed to swallow the wine when considering quality!

## wineQuality$rating: Bad
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00    9.20    9.60    9.85   10.40   13.60 
## ------------------------------------------------------------ 
## wineQuality$rating: Average
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.50    9.60   10.50   10.58   11.40   14.00 
## ------------------------------------------------------------ 
## wineQuality$rating: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.60   10.60   11.40   11.37   12.30   14.20 
## ------------------------------------------------------------ 
## wineQuality$rating: Great
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.50   11.00   12.00   11.64   12.60   14.00 
## ------------------------------------------------------------ 
## wineQuality$rating: Excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   12.40   12.50   12.18   12.70   12.90
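
The upward trend can be confirmed from the printed medians and means alone. A short sketch, with column names of my own choosing:

```r
# Median and mean alcohol per rating, copied from the table above.
alcohol_by_rating <- data.frame(
  rating         = c('Bad', 'Average', 'Good', 'Great', 'Excellent'),
  median_alcohol = c(9.60, 10.50, 11.40, 12.00, 12.50),
  mean_alcohol   = c(9.85, 10.58, 11.37, 11.64, 12.18)
)

# TRUE when every rating has a higher value than the rating before it.
median_rises <- all(diff(alcohol_by_rating$median_alcohol) > 0)
mean_rises   <- all(diff(alcohol_by_rating$mean_alcohol) > 0)
c(median_rises = median_rises, mean_rises = mean_rises)
```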



Univariate Analysis

What is the structure of your dataset?

I have 4,898 observations of 14 variables, or columns. This is a data-set downloaded from Udacity and is extremely tidy. Every variable is of type num except X and quality, which are of type int, and rating, which is an ordered factor. The “X” variable is an integer identifier for each wine, and there is no intention to do any coding including this variable.

What is/are the main feature(s) of interest in your dataset?

For this project I am interested in what defines a quality glass of wine. I can see that alcohol content, sulfur dioxide, pH, and density have interesting trends, but I will need to compare these variables further to determine how relevant they really are.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

I believe the variables that have the widest range will help in isolating which variables make a quality glass of wine. Having potential outliers in several variable fields will be the starting point. The variables that have a very small range, such as density, may not yield as much meaningful insight. But, as I discovered while exploring further, the range of numbers is not an indicator of usefulness.

Did you create any new variables from existing variables in the dataset?

Yes, I created the variable “rating”. I found it useful to add a text phrase to the quality variable. I did this for better grouping of data by quality and for a better viewing experience. This made the data more meaningful versus numbers as a metric for quality.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

The unusual distributions I observed are in residual.sugar, free.sulfur.dioxide, and total.sulfur.dioxide. I call these unusual because of how far the max value is from the mean. Though they are unusual, as can be seen from the univariate charts, these variables seem to play some part in a high quality glass of white wine. I believe that, as we move forward, the answer will require merging and viewing several variables in a single chart to reveal the relationships behind a high quality glass of wine.

Bivariate Plots Section

Scatterplot



Histograms of alcohol content as it relates to quality. Each chart is a quality/rating level from 3(Bad) to 9(Excellent) showing alcohol content and the portions for the respective quality.



Histograms of residual sugar content as it relates to quality/rating. Each chart has a quality level from 3(Bad) to 9(Excellent) showing residual sugar and the portions for the respective quality/rating. As I am interested in quality wine I have subsetted this chart according to quality/rating greater than 6(Average), or approximately the mean of quality.



These histograms show how free sulfur dioxide relates to wine quality. Each chart has a quality level from 3(Bad) to 9(Excellent) showing free sulfur dioxide and the portions for the respective quality. As I am interested in quality wine I have subsetted this chart according to quality/rating greater than 6(Average), or approximately the mean of quality.



These histograms show how total sulfur dioxide relates to wine quality. Each chart has a quality level from 3(Bad) to 9(Excellent) showing total sulfur dioxide and the portions for the respective quality. As I am interested in quality wine I have subsetted this chart according to quality/rating greater than 6(Average), or approximately the mean of quality.



This chart shows how the range of total sulfur dioxide shrinks as quality or rating increases.



As above, this chart shows the shrinking range of free sulfur dioxide as quality or rating increases.



An interesting trend is developing: as this pH table shows, the range of pH shrinks as quality or rating increases. This reinforces the thought that a fine median needs to be achieved for several variables to achieve quality.



Here we can see that as alcohol content (%) rises, quality trends up as well. The process for achieving higher alcohol content must not be easy, or we would see many more Bad wines attempt to make up for low quality with high alcohol. Since higher alcohol content is related to quality, that might otherwise seem like a good idea.
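
A comparison of alcohol content across ratings can be sketched with base R's boxplot(), which also returns the statistics it would draw. The plotting code used for this report is not shown, so this is only an assumption about one way to do it; `wines` is a stand-in built from the first three wines in the str() output and the five Excellent wines.

```r
# Stand-in data: alcohol for three Average wines and the five Excellent wines.
wines <- data.frame(
  alcohol = c(8.8, 9.5, 10.1, 10.4, 12.4, 12.5, 12.7, 12.9),
  rating  = factor(c(rep('Average', 3), rep('Excellent', 5)),
                   levels = c('Average', 'Excellent'))
)

# plot = FALSE returns the box statistics without drawing anything.
box_stats <- boxplot(alcohol ~ rating, data = wines, plot = FALSE)

# Row 3 of $stats holds the median of each group.
medians <- setNames(box_stats$stats[3, ], box_stats$names)
medians

# Draw the chart only when running interactively.
if (interactive()) {
  boxplot(alcohol ~ rating, data = wines,
          xlab = 'Rating', ylab = 'Alcohol (%)')
}
```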



Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the data-set?

It was not surprising to see the correlation between quality and alcohol content, but it was surprising that alcohol content was the only variable that showed any possible correlation to quality outright. I suspect that quality is a product of more than two variables. Sulfur dioxide, both free and total, seems to play what may be the biggest part in white wines.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

It was observed that density is closely related to total sulfur dioxide and residual sugar. Density, however, was not closely correlated with quality. These are interesting data points, but they seem to be only loosely related to quality.

What was the strongest relationship you found?

The strongest relationship was between residual sugar and density, with a Pearson’s r of 0.8389665 (95% confidence interval 0.8304732 to 0.8470698). This makes sense: as with a cup of hot tea, the more sugar that is dissolved, the denser the liquid becomes. Most white wines trended towards lower residual sugar levels but still maintained high density. This may be because total sulfur dioxide also has a strong correlation (0.5298813) with density. So two variables work at changing the density of white wine.

Multivariate Plots Section

Here, I create a new table grouped by rating and free sulfur dioxide while showing the mean and median of total sulfur dioxide.

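library(dplyr)  # provides %>%, group_by(), summarise(), n(), arrange()

# Mean and median total sulfur dioxide for each rating / free sulfur dioxide pair.
wineQuality.tsd_by_rating_fsd <- wineQuality %>%
  group_by(rating, free.sulfur.dioxide) %>%
  summarise(mean_tsd = mean(total.sulfur.dioxide),
            median_tsd = median(total.sulfur.dioxide),
            n = n(),
            .groups = "drop") %>%
  arrange(rating)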



The relationship between free and total sulfur dioxide is interesting. For wines rated Excellent, the mean and median of total sulfur dioxide are equal at every free sulfur dioxide value, and those values all fall between 24 and 57. As seen in the following tables, which cover that same free sulfur dioxide range of 24 to 57, the other ratings do not quite share these qualities. What is also interesting: wherever the mean and median of total sulfur dioxide are equal within this range, the count is almost always 1 or 2 (the one exception is the Great group at a free sulfur dioxide of 53, with a count of 6). Since the mean and median of one or two values are always equal, part of this pattern reflects how few wines fall into those groups.

rating free.sulfur.dioxide mean_tsd median_tsd n
Average 24.0 115.4194 112.5 62
Average 25.0 116.1509 113.0 53
Average 26.0 117.6232 111.0 69
Average 27.0 120.8511 119.0 47
Average 28.0 126.5098 122.0 51
Average 29.0 124.8361 122.0 61
Average 30.0 124.5714 117.0 49
Average 31.0 126.9627 117.0 67
Average 32.0 131.6852 121.5 54
Average 33.0 137.3788 136.0 66
Average 34.0 134.7547 132.0 53
Average 35.0 139.3333 138.0 45
Average 36.0 130.9296 122.0 71
Average 37.0 140.2241 130.0 58
Average 38.0 137.9216 131.0 51
Average 38.5 245.0000 245.0 1
Average 39.0 154.5581 151.0 43
Average 39.5 216.5000 216.5 1
Average 40.0 139.8611 141.0 36
Average 41.0 147.5455 142.5 44
Average 42.0 148.0408 146.0 49
Average 43.0 147.5714 142.5 28
Average 44.0 152.5278 161.0 36
Average 44.5 234.0000 234.0 2
Average 45.0 145.3056 140.5 36
Average 46.0 156.7826 149.0 23
Average 47.0 154.2553 156.0 47
Average 48.0 173.3448 180.0 29
Average 49.0 158.2051 159.0 39
Average 50.0 161.7000 156.0 30
Average 51.0 176.7667 170.0 30
Average 52.0 173.2432 177.0 37
Average 52.5 113.0000 113.0 2
Average 53.0 173.2973 166.0 37
Average 54.0 171.3636 167.0 22
Average 55.0 180.2857 190.0 21
Average 56.0 173.6154 177.0 26
Average 57.0 169.7727 168.0 22
rating free.sulfur.dioxide mean_tsd median_tsd n
Good 24.0 108.0625 99.5 16
Good 25.0 125.1724 133.0 29
Good 26.0 117.8214 114.0 28
Good 27.0 108.9444 100.5 18
Good 28.0 110.1739 110.0 23
Good 29.0 124.8909 121.0 55
Good 30.0 113.7727 112.5 22
Good 31.0 111.3333 113.0 27
Good 32.0 134.7083 136.0 24
Good 33.0 133.7778 125.0 27
Good 34.0 116.8077 114.5 26
Good 35.0 140.3750 128.0 40
Good 36.0 120.3913 122.0 23
Good 37.0 124.0000 110.5 20
Good 38.0 124.5455 119.5 22
Good 39.0 125.6500 122.5 20
Good 40.0 124.5294 119.0 34
Good 41.0 136.3929 142.0 28
Good 41.5 195.0000 195.0 2
Good 42.0 120.3846 120.0 13
Good 43.0 129.2222 132.0 9
Good 44.0 134.3000 136.0 10
Good 44.5 129.5000 129.5 2
Good 45.0 137.3913 138.0 23
Good 46.0 138.1176 143.0 17
Good 47.0 142.9286 134.5 14
Good 48.0 148.6000 139.5 10
Good 48.5 226.5714 229.0 7
Good 49.0 160.4615 164.0 13
Good 50.0 150.3333 149.0 9
Good 51.0 141.8333 130.5 6
Good 52.0 151.5000 158.0 6
Good 52.5 158.0000 158.0 1
Good 53.0 149.6667 143.0 6
Good 54.0 134.4000 128.0 5
Good 55.0 164.0909 149.0 11
Good 56.0 152.5000 152.5 2
Good 57.0 154.0000 156.0 3
rating free.sulfur.dioxide mean_tsd median_tsd n
Great 24 112.6667 125.0 3
Great 25 111.5000 111.5 2
Great 26 112.6667 109.0 3
Great 27 94.0000 94.0 2
Great 28 103.2500 97.0 4
Great 29 114.0000 118.0 11
Great 30 133.3750 117.0 8
Great 31 117.1429 119.0 7
Great 32 136.2500 136.5 4
Great 33 108.5000 108.5 2
Great 34 115.5000 112.0 8
Great 35 134.0000 135.0 3
Great 36 108.7500 113.5 4
Great 37 116.6000 122.0 10
Great 38 130.0000 132.0 4
Great 39 131.8333 123.5 6
Great 40 104.0000 104.0 1
Great 41 111.8000 98.0 5
Great 42 154.5000 154.5 2
Great 43 137.7778 145.0 9
Great 44 137.0000 137.0 1
Great 45 154.1111 155.0 9
Great 46 132.7500 132.5 4
Great 48 114.0000 114.0 1
Great 49 150.7500 144.5 4
Great 50 151.5000 151.5 2
Great 51 165.0000 165.0 1
Great 53 212.5000 212.5 6
Great 54 156.3333 155.0 3
Great 56 140.0000 140.0 2
rating free.sulfur.dioxide mean_tsd median_tsd n
Excellent 24 85 85 1
Excellent 27 139 139 1
Excellent 28 124 124 1
Excellent 31 113 113 1
Excellent 57 119 119 1
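
One point worth keeping in mind when reading the tables above is that a group containing one or two wines always has an equal mean and median, whatever the values are. A minimal sketch with hypothetical total sulfur dioxide readings (the numbers are chosen only for illustration):

```r
# Hypothetical total sulfur dioxide readings, not taken from the data-set.
one_wine    <- c(119)
two_wines   <- c(104, 141)
three_wines <- c(104, 141, 190)

# With one or two values the mean and median coincide by definition.
same_for_one   <- mean(one_wine) == median(one_wine)
same_for_two   <- mean(two_wines) == median(two_wines)

# With three or more values they usually differ.
same_for_three <- mean(three_wines) == median(three_wines)

c(one = same_for_one, two = same_for_two, three = same_for_three)
```

This is why the n column matters: equality of mean_tsd and median_tsd is only informative for groups with more than two wines.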



“In winemaking, the use of sulfur dioxide (SO2) is critical. We tend to talk a lot about free SO2 (FSO2) in particular, and not without good reason. The FSO2 and the pH of your wine determine how much SO2 is available in the active, molecular form to help protect the wine from oxidation and spoilage. FSO2 is also something we have to keep a close eye on, because it can be hard to predict how much will be lost, and at what rate, to binding or to aeration. Too much FSO2 can be perceptible to consumers, by masking the wine’s own fruity aromas and inhibiting its ability to undergo the cascade of oxygen-using reactions that happen when it “breathes,” or, in high enough concentrations, by contributing a sharp/bitter/metallic/chemical flavor or sensation.”

Moroney, M. (2018, February 27). Total sulfur dioxide - why it matters, too! Retrieved March 04, 2021, from https://www.extension.iastate.edu/wine/total-sulfur-dioxide-why-it-matters-too

This confirms what I believe I am seeing, but I do not believe it to be the total story. Sulfur dioxide seems to set the tone for the senses, such as smell and taste, but I also want to unlock what role alcohol content plays in quality, or whether it’s something of a coincidence. Put another way, do the same qualities that make a white wine Great or Excellent coincidentally also make the wine rich in alcohol content, or do the folks in the sample just prefer to get “drunker, quicker”?

I also create a new data set for total and free sulfur dioxide by alcohol content.

# Mean and median total sulfur dioxide for each alcohol / free sulfur dioxide / rating group.
wineQuality.tsd_by_alcohol <- wineQuality %>%
  group_by(alcohol, free.sulfur.dioxide, rating) %>%
  summarise(mean_tsd = mean(total.sulfur.dioxide),
            median_tsd = median(total.sulfur.dioxide),
            n = n(),
            .groups = "drop") %>%
  arrange(alcohol)



Here we also see the mean and median coincide, while free sulfur dioxide mostly falls within the tight range of 24 to 57, which is the Excellent range for free sulfur dioxide examined earlier. Interestingly, the white wines with the highest alcohol content share the characteristics observed for the five 9(Excellent) white wines.

alcohol free.sulfur.dioxide rating mean_tsd median_tsd n
14.00 12 Average 88 88 1
14.00 12 Good 120 120 1
14.00 33 Good 106 106 2
14.00 39 Great 150 150 1
14.05 31 Good 104 104 1
14.20 31 Good 113 113 1



I will examine the top four correlations.

Residual sugar and density.

## 
##  Pearson's product-moment correlation
## 
## data:  wineQuality$residual.sugar and wineQuality$density
## t = 107.87, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8304732 0.8470698
## sample estimates:
##       cor 
## 0.8389665



Total sulfur dioxide and free sulfur dioxide.

## 
##  Pearson's product-moment correlation
## 
## data:  wineQuality$total.sulfur.dioxide and wineQuality$free.sulfur.dioxide
## t = 54.645, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5977994 0.6326026
## sample estimates:
##      cor 
## 0.615501



Total sulfur dioxide and density.

## 
##  Pearson's product-moment correlation
## 
## data:  wineQuality$total.sulfur.dioxide and wineQuality$density
## t = 43.719, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5094349 0.5497297
## sample estimates:
##       cor 
## 0.5298813



Alcohol and quality.

## 
##  Pearson's product-moment correlation
## 
## data:  wineQuality$alcohol and wineQuality$quality
## t = 33.858, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4126015 0.4579941
## sample estimates:
##       cor 
## 0.4355747
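
The t statistics and confidence intervals in these four outputs follow from the correlation and the sample size alone. The sketch below recomputes them from the printed estimates, using the standard t formula for a Pearson correlation and the Fisher z transformation that cor.test() uses for its interval. The vector names are my own.

```r
n <- 4898  # number of wines, so df = n - 2 = 4896

# Correlation estimates copied from the four outputs above.
r <- c(sugar_density     = 0.8389665,
       total_free_so2    = 0.615501,
       total_so2_density = 0.5298813,
       alcohol_quality   = 0.4355747)

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z transformation.
z          <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- data.frame(r     = r,
                 t     = t_stat,
                 lower = tanh(z - half_width),
                 upper = tanh(z + half_width))
round(ci, 4)
```

With almost 5,000 wines the intervals are narrow, so the ordering of the four correlations is not in doubt even though the weakest of them, alcohol and quality, is only moderate.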



This chart attempts to show the relation between free and total sulfur dioxide, using box-plots to show how the window for quality narrows.



This chart shows the relation of total sulfur dioxide to alcohol content, with box-plots showing the quality ranges. The amount of total sulfur dioxide is very specific when achieving quality.



Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Total sulfur dioxide and free sulfur dioxide are key to creating an Excellent glass of wine. They are closely linked and there is a very tight range for perfection. Generally, total sulfur dioxide must be between 85 and 139, while free sulfur dioxide must be between 24 and 57. This alone only makes a Good glass. Excellence is achieved by hitting the median in both of those respective variables.
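
As a rough illustration of that window, the sketch below checks whether a wine falls inside the Excellent minimum and maximum from the earlier summary tables (24 to 57 for free and 85 to 139 for total sulfur dioxide). The function name and the stand-in data frame are my own; the rows are the first three wines from the str() output and the five Excellent wines.

```r
# Stand-in data: three 6-rated wines plus the five 9-rated wines.
wines <- data.frame(
  X                    = c(1, 2, 3, 775, 821, 828, 877, 1606),
  free.sulfur.dioxide  = c(45, 14, 30, 28, 24, 27, 57, 31),
  total.sulfur.dioxide = c(170, 132, 97, 124, 85, 139, 119, 113),
  quality              = c(6, 6, 6, 9, 9, 9, 9, 9)
)

# Hypothetical helper: TRUE when both measurements sit inside the Excellent window.
in_excellent_window <- function(free, total) {
  free >= 24 & free <= 57 & total >= 85 & total <= 139
}

wines$in_window <- in_excellent_window(wines$free.sulfur.dioxide,
                                       wines$total.sulfur.dioxide)
wines
```

All five Excellent wines pass by construction, but so does wine 3, which was rated 6. That fits the point that being inside the window is not enough on its own to reach the top rating.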

Were there any interesting or surprising interactions between features?

Alcohol content is interesting. I do not believe it to be a coincidence that alcohol is higher in better quality wine. Making such a wine is a very delicate balance between chemical variables that requires masterful skill. I say this because, out of almost 5000 samples, only 5 made Excellent. If one can achieve that Excellent rating, then the chemical variables will interact perfectly, which will show in every aspect of the wine, including alcohol. I will speak more on this in the reflection.


Final Plots and Summary

Plot One

Description One

I found this interesting because it represents an avenue I did not venture down. I see the levels of pH change in a very noticeable way as quality goes up. This makes me think there is also a link between pH and quality, as I see the range for 9(Excellent) is very small.

Plot Two

Description Two

This chart reinforces the connection between total and free sulfur dioxide and how the ranges decrease as quality goes up. A very specific range, or balance, of free and total sulfur dioxide is needed to achieve quality. Even the outliers’ range shortens as quality rises, leaving fewer chances of achieving quality if the balance is not maintained.

Plot Three

Description Three

This chart reinforced my thought that alcohol content is a by-product of quality. This chart, I feel, also embodies the complex chemistry it takes to make an excellent glass of white wine. In making a “Bad” glass of wine, the chemical equation has broken down, and it makes sense that the same broken equation would not achieve a higher alcohol content.


Reflection

It occurred to me, at the point in the report where the reference appears, that I don’t know anywhere near enough about what exactly merits an excellent glass of wine. I Googled and I read. Sulfur dioxide is important, as I found, but it’s the process that is really key. That process, with time variables measured against these same chemical variables, would likely yield more useful data in determining what makes quality. With that said, I think that before one can make use of this data, one needs to understand much more about the white wine making process.

Alcohol content was surprising. I was expecting it to correlate with more than just quality. Mostly I was expecting some link to residual sugar, as I read that sugars play a part in fermentation. My assumption is that, being residual, this sugar does not relate to alcohol. Perhaps if I knew how much sugar was used in creation, I could find a correlation to alcohol content, but residual sugar seems to be what was left over from the fermentation process. This also makes sense of why residual sugar correlates with density. I believe there may have been more interesting data points to explore here!